[tune] Change the log syncing behavior #4450
|
Can one of the admins verify this patch? |
|
Test FAILed. |
|
Test PASSed. |
doc/source/tune-usage.rst
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
|
||
| Tune automatically persists the progress of your experiments, so if an experiment crashes or is otherwise cancelled, it can be resumed with ``resume=True``. The default setting of ``resume=False`` creates a new experiment, and ``resume="prompt"`` will cause Tune to prompt you for whether you want to resume. You can always force a new experiment to be created by changing the experiment name. | ||
| Tune automatically persists the progress of your experiments, so if an experiment crashes or is otherwise cancelled, it can be resumed by passing one of True, False, "LOCAL", "REMOTE", or "PROMPT" to ``tune.run(resume=...)``. The default setting of ``resume=False`` creates a new experiment. ``resume="LOCAL"`` and ``resume=True`` restore the experiment from ``local_dir/[experiment_name]``. ``resume="REMOTE"`` syncs the upload dir down to the local dir and then restore the experiment from ``local_dir/experiment_name``. ``resume="PROMPT"`` will cause Tune to prompt you for whether you want to resume. You can always force a new experiment to be created by changing the experiment name. |
| Tune automatically persists the progress of your experiments, so if an experiment crashes or is otherwise cancelled, it can be resumed by passing one of True, False, "LOCAL", "REMOTE", or "PROMPT" to ``tune.run(resume=...)``. The default setting of ``resume=False`` creates a new experiment. ``resume="LOCAL"`` and ``resume=True`` restore the experiment from ``local_dir/[experiment_name]``. ``resume="REMOTE"`` syncs the upload dir down to the local dir and then restore the experiment from ``local_dir/experiment_name``. ``resume="PROMPT"`` will cause Tune to prompt you for whether you want to resume. You can always force a new experiment to be created by changing the experiment name. | |
| Tune automatically persists the progress of your experiments, so if an experiment crashes or is otherwise cancelled, it can be resumed by passing one of True, False, "LOCAL", "REMOTE", or "PROMPT" to ``tune.run(resume=...)``. The default setting of ``resume=False`` creates a new experiment. ``resume="LOCAL"`` and ``resume=True`` restore the experiment from ``local_dir/[experiment_name]``. ``resume="REMOTE"`` syncs the upload dir down to the local dir and then restores the experiment from ``local_dir/experiment_name``. ``resume="PROMPT"`` will cause Tune to prompt you for whether you want to resume. You can always force a new experiment to be created by changing the experiment name. |
|
Looks good to me! |
|
Awesome! |
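For reference, the resume options described in the `tune-usage.rst` change can be sketched as a small decision function. This is only an illustration of the documented behavior, not Tune's implementation: `plan_resume` and `ResumePlan` are hypothetical names, and the sketch assumes the experiment directory is simply `local_dir` joined with the experiment name.

```python
import os
from collections import namedtuple

# Hypothetical container, not part of Tune.
# action: "new" (create a new experiment), "restore", or "ask" (prompt the user).
# sync_down: whether the upload dir should be synced down to local_dir first.
# restore_path: directory to restore from, or None when nothing is restored yet.
ResumePlan = namedtuple("ResumePlan", ["action", "sync_down", "restore_path"])


def plan_resume(resume, local_dir, experiment_name):
    """Map a resume setting to the steps described in the docs (sketch only)."""
    experiment_dir = os.path.join(local_dir, experiment_name)
    if resume is False:
        # Default: a new experiment is created.
        return ResumePlan("new", False, None)
    if resume is True or resume == "LOCAL":
        # Restore directly from the local experiment directory.
        return ResumePlan("restore", False, experiment_dir)
    if resume == "REMOTE":
        # Sync the upload dir down, then restore from the local directory.
        return ResumePlan("restore", True, experiment_dir)
    if resume == "PROMPT":
        # The outcome depends on the user's answer, so nothing is decided here.
        return ResumePlan("ask", False, None)
    raise ValueError(
        'resume must be one of True, False, "LOCAL", "REMOTE", or "PROMPT"; '
        "got {!r}".format(resume))
```

Under these assumptions, `plan_resume("REMOTE", "~/ray_results", "my_exp")` yields a plan that syncs down first and then restores from `~/ray_results/my_exp`, while `resume=True` and `resume="LOCAL"` produce identical plans.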
python/ray/tune/tune.py
| else: | ||
| logger.info("Tip: to resume incomplete experiments, " | ||
| "pass resume='prompt' or resume=True to run()") | ||
| def _get_resume_path(local_checkpoint_dir, remote_checkpoint_dir): |
python/ray/tune/trial_runner.py
| def _validate_resume(self, resume_type): | ||
| """ | ||
| Args: | ||
| resume_type: One of "REMOTE", "LOCAL", "PROMPT". |
| resume_type: One of "REMOTE", "LOCAL", "PROMPT". | |
| resume_type: One of "REMOTE", "LOCAL", True, "PROMPT". |
python/ray/tune/trial_runner.py
| self._metadata_checkpoint_dir = metadata_checkpoint_dir | ||
| self._local_checkpoint_dir = local_checkpoint_dir | ||
|
|
||
| # TODO(rliaw): This may fail |
| # TODO(rliaw): This may fail |
python/ray/tune/syncer.py
| Args: | ||
| local_dir: Source directory for syncing. | ||
| remote_dir: Target directory for syncing. If None, | ||
| returns NoopSyncer. |
| returns NoopSyncer. | |
| returns BaseSyncer with a noop. |
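The suggested docstring wording (a `BaseSyncer` with a noop rather than a separate `NoopSyncer` class) could look roughly like the sketch below. Everything here is an assumption for illustration: the `get_syncer` factory name, the `sync_function` argument, and the `sync_up` method are hypothetical and do not reflect the actual contents of `syncer.py`.

```python
class BaseSyncer:
    """Minimal stand-in for a syncer; hypothetical, not Tune's class."""

    def __init__(self, local_dir, remote_dir, sync_function=None):
        self.local_dir = local_dir
        self.remote_dir = remote_dir
        # A sync_function of None makes every sync call a no-op.
        self.sync_function = sync_function

    def sync_up(self):
        """Sync local_dir to remote_dir; return True if a sync was run."""
        if self.sync_function is None:
            return False
        self.sync_function(self.local_dir, self.remote_dir)
        return True


def get_syncer(local_dir, remote_dir=None, sync_function=None):
    """Return a syncer for local_dir (source) and remote_dir (target).

    If remote_dir is None, the returned BaseSyncer has no sync function,
    so calling sync_up() does nothing.
    """
    if remote_dir is None:
        return BaseSyncer(local_dir, None, sync_function=None)
    return BaseSyncer(local_dir, remote_dir, sync_function)
```

One reason to prefer this shape is that callers can always call `sync_up()` without checking whether a remote directory was configured.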
What do these changes do?
Refactor the log sync behavior.
TODOs:
Fix up Trial syncing (remove remote capabilities from Trial?)
sync_function for remote to driver syncing. This may require a bit of restructuring to LogSyncing as a mixin.
Write Tests:
test_cluster.py). - Punting on this one because it is captured in e2e ft test.
os.path.expanduser on remote node (this is done in [tune] Later expansion of local_dir #4806)
Docs
DeprecationWarning
Tests
Related issue number
@richardliaw